COLT: a progress report

Author

  • Gisle Andersen
Abstract

Phase 1. The COLT material was collected in London by a research team at the University of Bergen in 1993. It consists of roughly half a million words of spontaneous conversation between 13- to 17-year-old boys and girls from socially different school districts. During 1994–95, the conversations were transcribed orthographically (including indication of pauses and overlapping speech) by transcribers engaged by the Longman Group, and tagged for word class by a team at Lancaster University. In this form, COLT has become part of the British National Corpus (BNC). A demo of this version of COLT has been made available on the Internet. At this point, the entire corpus has been checked and edited by the team in Bergen. The frequent occurrence of labels and the numerous instances of a question mark for speaker identity in the original transcripts indicate that the transcribers faced considerable problems. During our checking process, a great many of these labels have disappeared, most of the speakers have been identified (with the original names replaced by fictitious ones), and mistakes in the original transcription have been corrected. As a result, we have ended up not only with a transcription that is more faithful to the tape recordings but also with a larger corpus; the number of words has increased by at least 15 per cent. This, in turn, means that the original word-class tagging has become partly inadequate and that the edited corpus will have to be retagged. The retagging, which will be done by means of the Xanthippe software with assistance from Lancaster University, will be completed in the early...

ICAME Journal No. 20


Similar resources

Bilateral Polydactyly in a foal

The following case report describes the diagnosis and surgery of bilateral polydactyly of unknown origin in a colt. A 7-month-old Berber colt was referred for cosmetic and curative excision of supernumerary digits. Radiographic examination revealed bilateral polydactyly and well-developed first carpal bones. Surgery consisted of an osteotomy of both second metacarpal bones combined with an amput...


Draft Genome Sequence of Agrobacterium sp. Strain UHFBA-218, Isolated from Rhizosphere Soil of Crown Gall-Infected Cherry Rootstock Colt

We report here the draft genome sequence of the alphaproteobacterium Agrobacterium sp. strain UHFBA-218, which was isolated from rhizosphere soil of crown gall-infected cherry rootstock Colt. The draft genome of strain UHFBA-218 consists of 112 contigs (5,425,303 bp) and 5,063 coding sequences with a G+C content of 59.8%.


Surgical treatment and a unique management of rostral mandibular fracture with cerclage wire in a horse

A 3-year-old Arabian colt was presented with a major gingival wound at the right rostral part of the mandible. After clinical assessment, a rostral mandibular fracture was diagnosed. The fractured region was stabilized by cerclage wire application under general anesthesia. Fixation wires were left in place for 6 weeks. A 3-month follow-up revealed complete fracture healing. The purpose o...


Learning Theory, 18th Annual Conference on Learning Theory, COLT 2005, Bertinoro, Italy, June 27-30, 2005, Proceedings





Publication date: 1997